UCB Algorithm and Adversarial Bandit Problem

Author

  • Hossein Keshavarz
Abstract

Review of the stochastic multi-armed bandit problem. Consider a gambler who plays against an N-armed slot machine and sequentially pulls arms to minimize the expected regret. Each arm i ∈ {1, …, N} has a probability distribution D_i supported on the closed interval [0, 1]. Let μ_i be the expected value of the loss corresponding to arm i, and define Ĩ = argmin_{1≤j≤N} μ_j as the index of the arm with the smallest expected loss. Suppose that the gambler selects arm I_t at round t; then the expected regret of the gambler up to round T is defined as

R_T = E[ ∑_{t=1}^{T} μ_{I_t} ] − T · μ_Ĩ.
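As a concrete illustration of this setup, the following is a minimal sketch of the UCB1 index policy of Auer et al., adapted to losses: optimism here means subtracting the confidence radius from the empirical mean loss and playing the arm with the smallest resulting index. The Bernoulli loss distributions, the horizon, and the helper name ucb1_losses are hypothetical choices for illustration, not taken from the paper.

import math
import random

def ucb1_losses(means, T, seed=0):
    """Minimal UCB1 sketch for a loss-minimization bandit.

    means : hypothetical expected losses of each arm, values in [0, 1]
    T     : horizon (number of rounds)
    Returns the pseudo-regret  sum_t mu_{I_t} - T * min(means).
    """
    rng = random.Random(seed)
    n = len(means)
    pulls = [0] * n          # times each arm has been played
    loss_sum = [0.0] * n     # cumulative observed loss per arm
    regret = 0.0
    best = min(means)        # mu_~I, the optimal expected loss

    for t in range(1, T + 1):
        if t <= n:
            arm = t - 1      # play each arm once to initialize
        else:
            # Optimism for losses: empirical mean minus confidence
            # radius; pick the arm with the smallest optimistic index.
            arm = min(
                range(n),
                key=lambda i: loss_sum[i] / pulls[i]
                              - math.sqrt(2 * math.log(t) / pulls[i]),
            )
        # Bernoulli loss drawn from the (hypothetical) distribution D_arm
        loss = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        loss_sum[arm] += loss
        regret += means[arm] - best
    return regret

# Example: three arms; the pseudo-regret should grow roughly like log(T)
print(ucb1_losses([0.2, 0.5, 0.6], T=10_000))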


Similar resources

UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem

In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const · K log(T)/Δ, where Δ measures the distance between a suboptimal arm an...
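For context, distribution-dependent bounds of this type are usually stated per arm. Writing Δ_i for the gap between arm i and the optimal arm, the classical form from the original UCB1 analysis (paraphrased from Auer et al., not quoted from this paper) reads

E[R_T] ≤ ∑_{i: Δ_i > 0} (8 ln T) / Δ_i + (1 + π²/3) · ∑_i Δ_i,

so the dependence on T is logarithmic, with each suboptimal arm contributing inversely to its gap.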

Full text

Regional Multi-Armed Bandits

We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when the player selects an arm at each time slot, information of other arms in the same group is also revealed. This regional bandit model naturally bridges the non...

Full text

Cornering Stationary and Restless Mixing Bandits with Remix-UCB

We study the restless bandit problem where arms are associated with stationary φ-mixing processes and where rewards are therefore dependent: the question that arises from this setting is that of carefully recovering some independence by ‘ignoring’ the values of some rewards. As we shall see, the bandit problem we tackle requires us to address the exploration/exploitation/independence trade-off,...

Full text

Budget-Constrained Multi-Armed Bandits with Multiple Plays

We study the multi-armed bandit problem with multiple plays and a budget constraint for both the stochastic and the adversarial setting. At each round, exactly K out of N possible arms have to be played (with 1 ≤ K ≤ N). In addition to observing the individual rewards for each arm played, the player also learns a vector of costs which has to be covered with an a priori defined budget B. The ga...
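To make this protocol concrete, here is a minimal simulation skeleton with a naive uniform-random baseline policy: purely illustrative, with hypothetical Bernoulli rewards and uniform costs, and not the algorithm proposed in that paper.

import random

def play_budget_bandit(reward_means, cost_means, K, B, seed=0):
    """Naive baseline for the budget-constrained multiple-play setting:
    each round exactly K of the N arms are played, individual rewards
    and a cost vector are observed, and play stops once the a priori
    budget B is exhausted (the final round may slightly overshoot it).
    Reward/cost distributions here are hypothetical choices."""
    rng = random.Random(seed)
    N = len(reward_means)
    total_reward, spent, rounds = 0.0, 0.0, 0
    while spent < B:
        arms = rng.sample(range(N), K)       # exactly K of the N arms
        for i in arms:
            total_reward += float(rng.random() < reward_means[i])
            spent += cost_means[i] * 2 * rng.random()   # cost in [0, 2*mean]
        rounds += 1
    return total_reward, rounds

# Example: 3 arms, 2 plays per round, budget 50
print(play_budget_bandit([0.3, 0.5, 0.7], [0.2, 0.4, 0.6], K=2, B=50.0))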

Full text

Machine Learning Approaches for Interactive Verification

Interactive verification is a new problem, which is closely related to active learning, but aims to query as many positive instances as possible within some limited query budget. We point out the similarity between interactive verification and another machine learning problem called contextual bandit. The similarity allows us to design interactive verification approaches from existing contextua...

Full text



Journal title:

Volume   Issue

Pages  -

Publication date  2013